# Evaluating Approximated Full Adders at 16nm

Pedro H. A. Silva and Cristina Meinhardt

Department of Informatics and Statistics, Federal University of Santa Catarina - UFSC

Florianópolis, Brazil

pedro.aquino@grad.ufsc.br, cristina.meinhardt@ufsc.br

Abstract—This work provides a comparison of a set of approximate full adder circuits in 16nm device technologies, with the goal of identifying how these designs behave in a specific environment compared to conventional exact adders, analyzing performance and power consumption. These parameters allow designers to compare the pros and cons of each inexact design, and evaluate the possible benefits in using them instead of exact adders, such as the mirror adder, in error tolerant applications. The results showed that one analyzed adder was not as energy efficient as expected. There were reductions of up to 85% in power consumption with the use of XNOR-based approximate adders, with the drawback of an increase in critical path timing. Also, it's possible to reduce up to 75% in power consumption and 25% in critical delay with the use of logically simplified CMOS adders.

Index Terms—approximate computing, full adder, low-power

## I. INTRODUCTION

Approximate computing (AC) is an emerging research area capable of delivering significant energy savings [1]. It exploits the fact that many applications do not require exact results. In recent years, AC has been explored in hardware and software development for different contexts, including video and sound applications, Internet of Things devices, fault-tolerant environments, computer vision, machine learning and sensor networks [2].

There are many error-tolerant computations in which exploring AC enlarges the design space by adding quality metrics [3]. Some of the opportunities for approximate computing are applications that [3] [4]: (1) process noisy real-world data, such as data coming from sensors, for example in Internet of Things applications; (2) produce a final result that is perceived by the human senses, including many inference and vision problems; and (3) are based on inherently imprecise algorithms, in which the concept of a correct result is replaced by a range of acceptable results, such as recognition, data analytics and machine learning.

The main motivation for the development of AC solutions is the increased demand for low-power designs [5]. Nowadays, in deep nanometer designs, battery life is a significant factor to be considered. Many applications involve a large number of arithmetic operations and make heavy use of adder modules. Addition is the main arithmetic function in computer systems and the basis of the most commonly used arithmetic blocks. Thus, the 1-bit full adder (FA) is one of the most critical basic blocks of an arithmetic unit, and improving the performance of the full adder cell is vital to achieving low-power and fast arithmetic blocks [6].

In the literature, many works explore AC on arithmetic blocks at the architectural or Register-Transfer Level (RTL) [3] [5] [7]. Few works investigate AC techniques applied to the transistor-level design of full adders. Thus, this work provides a comparison of a set of approximate full adder circuits in a nanometer technology. The main goal is to identify how these designs behave in a specific environment compared to conventional exact adders, analyzing performance, power consumption and Power-Delay Product (PDP). This information helps designers better understand the approximate FA alternatives and choose the most appropriate FA for a specific application.

## II. APPROXIMATE ADDERS

In this work, seven different full adder topologies were chosen and analyzed. Four of them are the traditional mirror adder (MA) implementation, shown in Fig. 1, and three approximations of it [8], shown in Figs. 2(a), 2(b) and 2(c). The others are an XNOR-based exact implementation [9], shown in Fig. 3(a), and XOR-based and XNOR-based inexact full adders, shown in Figs. 3(b) and 3(c), respectively [10].



Fig. 1: Mirror Adder (MA).

The Mirror CMOS architecture is considered the most traditional one, and was chosen as the baseline for comparison among the studied designs. It is composed of 24 transistors, arranged in pull-up and pull-down networks that are logically complementary. The main advantage of this architecture is that it provides good conductivity and very good robustness when working with very small technologies and low voltages. However, the main disadvantages of the 1-bit CMOS FA are the high input capacitance and the impact of the pull-up network, which makes the circuit slower [11].

(a) Simplified Mirror Adder (SMA)

(b) Approximate MA 1 (AMA1)

Fig. 2: Approximated Adders based on Mirror Adder

Fig. 3: Approximated Adders based on XOR/XNOR FA

The XNOR-based FA shown in Fig. 3(a) was designed with Pass-Transistor Logic (PTL) techniques and uses 10 transistors. This circuit was chosen as an example of a low-power and area-efficient full adder design [9] [10].

The adder approximations exploit the relaxation of numerical accuracy, and were designed with reduced logical complexity in order to lower transistor count and reduce power consumption. As a result, their truth tables differ from that of the exact implementation, as shown in Table I; the failing inputs of each design are listed in Table II.
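The failing inputs of each approximation can be derived mechanically by comparing its truth table against the exact one. The following is a minimal sketch, assuming the (Sum, Cout) pairs are transcribed row by row from Table I; the names `TRUTH`, `INPUTS` and `failing_inputs` are introduced here for illustration only and are not part of the original work.

```python
# Input vectors in the order A B Cin = 000 ... 111.
INPUTS = [f"{i:03b}" for i in range(8)]

# (Sum, Cout) for each input vector, transcribed from Table I.
TRUTH = {
    "EXACT": [(0, 0), (1, 0), (1, 0), (0, 1), (1, 0), (0, 1), (0, 1), (1, 1)],
    "SMA":   [(0, 0), (1, 0), (0, 1), (0, 1), (0, 0), (0, 1), (0, 1), (1, 1)],
    "AMA1":  [(1, 0), (1, 0), (0, 1), (0, 1), (1, 0), (0, 1), (0, 1), (0, 1)],
    "AMA2":  [(0, 0), (1, 0), (0, 0), (1, 0), (0, 1), (0, 1), (0, 1), (1, 1)],
    "AXA1":  [(0, 0), (1, 0), (0, 1), (1, 0), (0, 1), (1, 0), (0, 1), (1, 1)],
    "AXA2":  [(1, 0), (1, 0), (0, 0), (0, 1), (0, 0), (0, 1), (1, 1), (1, 1)],
}


def failing_inputs(topology):
    """Return the input vectors where Sum or Cout differs from the exact adder."""
    exact = TRUTH["EXACT"]
    approx = TRUTH[topology]
    return [vec for vec, e, a in zip(INPUTS, exact, approx) if e != a]


if __name__ == "__main__":
    for name in ("SMA", "AMA1", "AMA2", "AXA1", "AXA2"):
        fails = failing_inputs(name)
        print(f"{name}: {', '.join(fails)} ({len(fails)} failing inputs)")
```

An input counts as failing when either output bit is wrong, which is the convention that reproduces the fail counts of 2, 3, 3, 4 and 4 for SMA, AMA1, AMA2, AXA1 and AXA2.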

## III. METHODOLOGY

The present work focuses on the reduction in power consumption provided by the approximate adders in comparison to conventional exact full adder topologies, also observing the impact on delay. Thus, delay, power consumption and PDP are evaluated for each of the seven analyzed adders under nominal voltage.

The topologies are simulated using the Predictive Technology Model (PTM) provided by Arizona State University, at the 16 nm bulk technology node [12]. The nominal supply voltage used was 0.7 V.

TABLE I: Truth tables for each approximate adder design

| Input (A B Cin) | EXACT S | EXACT Cout | SMA S | SMA Cout | AMA1 S | AMA1 Cout | AMA2 S | AMA2 Cout | AXA1 S | AXA1 Cout | AXA2 S | AXA2 Cout |
|-----------------|---------|------------|-------|----------|--------|-----------|--------|-----------|--------|-----------|--------|-----------|
| 000             | 0       | 0          | 0     | 0        | 1      | 0         | 0      | 0         | 0      | 0         | 1      | 0         |
| 001             | 1       | 0          | 1     | 0        | 1      | 0         | 1      | 0         | 1      | 0         | 1      | 0         |
| 010             | 1       | 0          | 0     | 1        | 0      | 1         | 0      | 0         | 0      | 1         | 0      | 0         |
| 011             | 0       | 1          | 0     | 1        | 0      | 1         | 1      | 0         | 1      | 0         | 0      | 1         |
| 100             | 1       | 0          | 0     | 0        | 1      | 0         | 0      | 1         | 0      | 1         | 0      | 0         |
| 101             | 0       | 1          | 0     | 1        | 0      | 1         | 0      | 1         | 1      | 0         | 0      | 1         |
| 110             | 0       | 1          | 0     | 1        | 0      | 1         | 0      | 1         | 0      | 1         | 1      | 1         |
| 111             | 1       | 1          | 1     | 1        | 0      | 1         | 1      | 1         | 1      | 1         | 1      | 1         |

TABLE II: Transistor and error count in the FA approximations

| Topology | Transistor count | Failing inputs     | Fail count |
|----------|------------------|--------------------|------------|
| SMA      | 16               | 010, 100           | 2          |
| AMA1     | 11               | 000, 010, 111      | 3          |
| AMA2     | 11               | 010, 011, 100      | 3          |
| AXA1     | 8                | 010, 011, 100, 101 | 4          |
| AXA2     | 6                | 000, 010, 100, 110 | 4          |

To characterize the adder designs, simulations were carried out in NGSPICE. The experiment consisted of extracting the critical delay and the energy consumption, from which the power consumption and the Power-Delay Product (PDP) of each topology were calculated. A transient analysis is used to obtain both quantities, applying the definitions of propagation delay and consumed energy given in [13].

The average power consumption is obtained by dividing the energy consumption by the total simulation time. The PDP is the product of power consumption and critical delay.
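These two definitions can be written down directly. The sketch below is illustrative only: the function names and the 1 fJ over 10 ns example are assumptions made here, while the MA power and delay values are the ones reported in Table III.

```python
def average_power(energy_joules, sim_time_seconds):
    """Average power: consumed energy divided by the total simulation time."""
    return energy_joules / sim_time_seconds


def power_delay_product(power_watts, critical_delay_seconds):
    """PDP: average power multiplied by the critical delay (result in joules)."""
    return power_watts * critical_delay_seconds


if __name__ == "__main__":
    # Hypothetical numbers, only to show the units: 1 fJ over 10 ns is 100 nW.
    print(average_power(1e-15, 10e-9))

    # Mirror adder values from Table III: 485.51 nW and 44.35 ps.
    pdp_joules = power_delay_product(485.51e-9, 44.35e-12)
    print(f"MA PDP = {pdp_joules * 1e18:.2f} aJ")
```

With power in nW and delay in ps, the product comes out in units of 10^-21 J, so dividing by 1000 gives the attojoule figures used in the results.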

All transistors were sized based on the MOSIS scalable CMOS rules [13]. Each transistor has a channel length L = 16nm; the channel width is Wn = 32nm for NMOS transistors and Wp = 64nm for PMOS transistors. For the analysis of the circuits, two inverters were used at each input and two inverters (fan-out of 2) were used as load in order to emulate a more realistic scenario [13].

Additionally, the two XNOR-based adders required the insertion of a boost supply voltage in order to be properly analyzed, as their outputs presented excessive noise. The boost source is connected to the input and the supply voltage adopted is 0.9V, which resulted in an extra power consumption of $5.96\,\mu$W for the exact adder and $2.36\,\mu$W for the AXA2.

This work consists of two steps: the logical validation of the arrangements, followed by the extraction of data (delay, energy consumption and power) for each circuit. The first step is done by implementing the circuits and applying stimuli to ensure proper functioning, according to the truth tables presented in Table I. The second step requires the definition of the transition arcs of the truth table. A transition arc is considered when exactly one input changes and the output changes as a result (high to low or low to high). Each topology provides two outputs, sum and carry, and the different approximations have different transition arcs; in total, 14 transition arcs were defined.
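The arc definition above (exactly one input toggles and the observed output toggles) can be illustrated with a short enumeration. The sketch below is an assumption about how candidate arcs could be listed, not the authors' script: the name `transition_arcs` and the parity and majority expressions for the exact sum and carry are introduced here for illustration. It lists every candidate arc, so its counts need not match the 14 arcs selected in this work, whose selection criteria are not detailed.

```python
from itertools import product


def transition_arcs(output):
    """List (source, destination) input pairs that differ in exactly one bit
    and produce different values of the given output.

    `output` maps a 3-character input string "A B Cin" (e.g. "010") to 0 or 1.
    """
    arcs = []
    for bits in product("01", repeat=3):
        src = "".join(bits)
        for i in range(3):
            flipped = "1" if src[i] == "0" else "0"
            dst = src[:i] + flipped + src[i + 1:]
            if output[src] != output[dst]:
                arcs.append((src, dst))
    return arcs


# Exact full adder outputs: sum is the parity of the inputs,
# carry is their majority (both agree with the EXACT column of Table I).
EXACT_SUM = {f"{i:03b}": bin(i).count("1") % 2 for i in range(8)}
EXACT_COUT = {f"{i:03b}": int(bin(i).count("1") >= 2) for i in range(8)}

if __name__ == "__main__":
    print(len(transition_arcs(EXACT_SUM)), "candidate arcs for the exact sum")
    print(len(transition_arcs(EXACT_COUT)), "candidate arcs for the exact carry")
```

Passing the sum or carry column of an approximate design instead of the exact one yields a different arc list, which is why the arcs differ between topologies.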

Because the truth tables differ, the arcs and therefore the total simulation times also differ, ranging from 12ns to 25ns for the arcs relating to the sum output, and from 12ns to 13ns for the arcs relating to the carry output. The comparison was thus made according to critical delay, considering both sum and carry simulations; power consumption, taking the larger of the values calculated for sum and carry; and finally PDP. The period for all simulation arcs was kept at 1ns.

## IV. RESULTS

The evaluation considers the electrical behavior of the circuits at nominal operation, starting with the exact MA and its approximations, and then moving to the XNOR-based exact adder and its approximations. After that, each exact topology is compared with its respective approximations, and finally all circuits, exact and approximate, are compared with the conventional MA implementation. The results are summarized in Table III.

TABLE III: Electrical summary

| Topology     | Critical Delay (ps) | Power (nW)    | PDP (aJ) |
|--------------|---------------------|---------------|----------|
| mirror adder | 44.35               | 485.51        | 21.53    |
| SMA          | 34.57               | 325.68        | 11.26    |
| AMA1         | 31.18               | 290.97        | 9.07     |
| AMA2         | 32.60               | 123.95        | 4.04     |
| EXA          | 324.19              | 262.61        | 85.14    |
| AXA1         | 235.90              | 989.53        | 233.43   |
| AXA2         | 194.87              | 70.48         | 13.73    |

The data were normalized for easier comparison, using the conventional MA and the exact XNOR-based adder as the baseline in each of the following two subsections, respectively.
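The normalization can be reproduced from the tabulated values. The following is a minimal sketch, assuming the figures are copied from Table III; the names `TABLE_III`, `normalize` and `reduction_percent` are introduced here for illustration, and the EXA delay uses the 324.19 ps value quoted in the text of Section IV-B.

```python
METRICS = ("delay_ps", "power_nw", "pdp_aj")

# (critical delay in ps, power in nW, PDP in aJ) per topology, from Table III.
# EXA delay: 324.19 ps, as quoted in the text of Section IV-B.
TABLE_III = {
    "MA":   (44.35, 485.51, 21.53),
    "SMA":  (34.57, 325.68, 11.26),
    "AMA1": (31.18, 290.97, 9.07),
    "AMA2": (32.60, 123.95, 4.04),
    "EXA":  (324.19, 262.61, 85.14),
    "AXA1": (235.90, 989.53, 233.43),
    "AXA2": (194.87, 70.48, 13.73),
}


def normalize(baseline):
    """Divide every metric of every topology by the baseline topology's value."""
    base = TABLE_III[baseline]
    return {
        name: {m: v / b for m, v, b in zip(METRICS, values, base)}
        for name, values in TABLE_III.items()
    }


def reduction_percent(ratio):
    """Convert a normalized ratio into a percentage reduction (negative = increase)."""
    return (1.0 - ratio) * 100.0


if __name__ == "__main__":
    vs_ma = normalize("MA")
    vs_exa = normalize("EXA")
    print(f"AXA2 power vs MA:  {reduction_percent(vs_ma['AXA2']['power_nw']):.2f}% lower")
    print(f"AXA2 power vs EXA: {reduction_percent(vs_exa['AXA2']['power_nw']):.2f}% lower")
```

A ratio below 1 means the design improves on the baseline for that metric, and a ratio above 1 means it is worse.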

### A. Comparison with Accurate MA

All studied designs aim to provide some form of improvement when compared to the conventional MA implementation.

In terms of power consumption, the best result was obtained by the AXA2 design (70.48nW), a decrease of 85.48% relative to the conventional MA (485.51nW), as shown in Fig. 4. Considering only the designs based on the accurate MA (SMA, AMA1 and AMA2), the best case was AMA2 (123.95nW), a reduction of 74.46% relative to the exact design. The AXA1, in contrast, consumed more power than the MA: 989.53nW, about $2.04\times$ the MA value (an increase of roughly 104%).



Fig. 4: Power Consumption Gain Compared to MA.

The critical delay was lowest in the AMA1 design (31.18ps), 70.30% of that of the exact MA (44.35ps). All three XOR/XNOR-based adders had considerably worse results, several times larger than those of the MA and its approximations, with the best case being the AXA2 (194.87ps), $4.39\times$ the delay of the conventional MA.

As for the PDP, the lowest result was found in AMA2 (4.04aJ), 18.77% of the PDP of the exact MA (21.53aJ). Among the XOR/XNOR-based designs evaluated, the best case was the AXA2 design (13.73aJ), 63.80% of the PDP of the conventional MA.

### B. XOR/XNOR-based approximations

In terms of power consumption at the nominal supply voltage, AXA2 has the lowest value among the XOR/XNOR approximations (AXA1 and AXA2): 70.48nW, a reduction of 73.16% in comparison with the XNOR-based exact adder (262.61nW), as Fig. 5 shows.



Fig. 5: Inexact Adders Compared to Exact XNOR FA.

The critical delay was also lowest in AXA2 (194.87ps), a decrease of 39.89% when compared to the exact XNOR adder (324.19ps), also shown in Fig. 5.

The lowest value for PDP was thus obtained in AXA2 (13.73aJ), 83.87% less than the exact XNOR adder (85.14aJ).

The AXA1 implementation only showed improvement in the critical delay (235.90ps), a reduction of 27.23% when compared to the exact XNOR adder. Its power consumption was $3.77\times$ that of the exact adder (an increase of $2.77\times$), and its PDP was $2.74\times$ that of the exact adder (an increase of $1.74\times$).

## V. CONCLUSION

Since there are various transistor arrangements for full adders, and given that they are critical circuits in many applications that are error tolerant, it is important we understand the behaviour, pros and cons in each design, in particular when considering the use of inexact and logically simplified adders.

In this work, seven different adders were analyzed. Two of them are exact adders, and the other five adopt approximate computing. Most of them showed improvement in power consumption and PDP over the conventional MA. Every approximate design also reduced the critical delay relative to the exact adder it derives from: SMA, AMA1 and AMA2 relative to the conventional MA, and AXA1 and AXA2 relative to the exact XNOR adder. The simplified designs reached over 85% reduction in power consumption (AXA2, relative to the MA), nearly 30% reduction in critical delay (AMA1, relative to the MA), and over 80% decrease in PDP (AMA2 relative to the MA, and AXA2 relative to the exact XNOR adder).

It is noted that while the XOR/XNOR-based adders showed a reduction in power consumption, they needed boost voltages in order to generate signals with acceptable noise levels, which increased their overall power consumption, and they still had the drawback of larger critical path delays. Taking this into account, the best practical power consumption is observed in the AMA2 design, which presented a reduction of $\sim$75% in power consumption and a decrease of $\sim$25% in critical delay.

Notably, there was no improvement observed in energy efficiency in the AXA1 design whatsoever, demonstrating that using circuits that are logical simplifications will not necessarily result in reduction of power consumption.

As future work, this project will investigate the impact of adopting the approximate full adders in error-tolerant applications, starting with hardware designs that process noisy real-world data and with video applications.

## ACKNOWLEDGMENT

This work was financed in part by the National Council for Scientific and Technological Development (CNPq) and Propesq/UFSC.

## REFERENCES

- [1] J. Han. Introduction to approximate computing. In 2016 IEEE 34th VLSI Test Symposium (VTS), pages 1–1, April 2016.
- [2] T. Moreau, A. Sampson, and L. Ceze. Approximate computing: Making mobile systems more efficient. *IEEE Pervasive Computing*, 14(2):9–13, Apr 2015.
- [3] A. G. M. Strollo and D. Esposito. Approximate computing in the nanoscale era. In 2018 International Conference on IC Design Technology (ICICDT), pages 21–24, June 2018.
- [4] D. Marwaha and A. Sharma. A review on approximate computing and some of the associated techniques for energy reduction in IoT. In 2018 2nd International Conference on Inventive Systems and Control (ICISC), pages 319–323, Jan 2018.
- [5] M. Osta, A. Ibrahim, H. Chible, and M. Valle. Approximate multipliers based on inexact adders for energy efficient data processing. In 2017 New Generation of CAS (NGCAS), pages 125–128, Sep. 2017.
- [6] Aminul Islam, M. W. Akram, Ale Imran, and Mohd. Hasan. Energy efficient and process tolerant full adder design in near threshold region using FinFET. In *Proceedings of the 2010 International Symposium on Electronic System Design*, ISED '10, pages 56–60, Washington, DC, USA, 2010. IEEE Computer Society.
- [7] M. Ha and S. Lee. Multipliers with approximate 4-2 compressors and error recovery modules. *IEEE Embedded Systems Letters*, 10(1):6–9, March 2018.
- [8] Vaibhav Gupta, Debabrata Mohapatra, Sang Phill Park, Anand Raghunathan, and Kaushik Roy. Impact: Imprecise adders for low-power approximate computing. *IEEE/ACM International Symposium on Low Power Electronics and Design*, pages 409–414, 2011.
- [9] S. Mohanraj and M. Maheswari. SERF and modified SERF adders for ultra low power design techniques. *Procedia Engineering*, 30:639–645, 2012. International Conference on Communication Technology and System Design 2011.
- [10] Zhixi Yang, Ajaypat Jain, Jinghang Liang, Jie Han, and Fabrizio Lombardi. Approximate XOR/XNOR-based adders for inexact computing. 2013 13th IEEE International Conference on Nanotechnology (IEEE-NANO 2013), pages 690–693, 2013.
- [11] Keivan Navi, Omid Kavehei, Mahnoush Rouholamini, Amir Sahafi, Shima Mehrabi, and Nooshin Dadkhahi. Low-power and high-performance 1-bit CMOS full adder cell. *Journal of Computers - JCP*, 3:48–54, 02 2008.
- [12] R. Zimmermann and W. Fichtner. Low-power logic styles: CMOS versus pass-transistor logic. *IEEE Journal of Solid-State Circuits*, 32(7):1079–1090, July 1997.
- [13] Neil Weste and David Harris. CMOS VLSI Design: A Circuits and Systems Perspective. Addison-Wesley Publishing Company, USA, 4th edition, 2010.